Search for: All records

Creators/Authors contains: "Brand, Lodewijk"

A linear primal–dual multi-instance SVM for big data classifications

https://doi.org/10.1007/s10115-023-01961-z

Brand, Lodewijk; Seo, Hoon; Baker, Lauren Zoe; Ellefsen, Carla; Sargent, Jackson; Wang, Hua (August 2023, Knowledge and Information Systems)

Multi-instance learning (MIL) handles data that is organized into sets of instances known as bags. Traditionally, MIL is used in the supervised-learning setting for classifying bags which contain any number of instances. However, many traditional MIL algorithms do not scale efficiently to large datasets. In this paper, we present a novel primal–dual multi-instance support vector machine that can operate efficiently on large-scale data. Our method relies on an algorithm derived using a multi-block variation of the alternating direction method of multipliers. The approach presented in this work is able to scale to large-scale data since it avoids iteratively solving quadratic programming problems which are broadly used to optimize MIL algorithms based on SVMs. In addition, we improve our derivation to include an additional optimization designed to avoid solving a least-squares problem in our algorithm, which increases the utility of our approach to handle a large number of features as well as bags. Finally, we derive a kernel extension of our approach to learn nonlinear decision boundaries for enhanced classification capabilities. We apply our approach to both synthetic and real-world multi-instance datasets to illustrate the scalability, promising predictive performance, and interpretability of our proposed method.
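The bag-and-instance structure described above can be illustrated with a short sketch. It assumes the common MIL convention that a bag is positive when its highest-scoring instance is positive, and it uses a fixed, already-trained linear scorer. The names `instance_score`, `bag_score`, and `predict_bag` are hypothetical, and this is not the primal–dual ADMM training procedure from the paper.

```python
from typing import List, Sequence

Instance = Sequence[float]
Bag = List[Instance]


def instance_score(w: Sequence[float], b: float, x: Instance) -> float:
    """Linear score w.x + b for a single instance."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bag_score(w: Sequence[float], b: float, bag: Bag) -> float:
    """Score a bag by its highest-scoring instance (assumed MIL convention)."""
    return max(instance_score(w, b, x) for x in bag)


def predict_bag(w: Sequence[float], b: float, bag: Bag) -> int:
    """Return +1 if the bag score is non-negative, otherwise -1."""
    return 1 if bag_score(w, b, bag) >= 0 else -1


if __name__ == "__main__":
    w, b = [1.0, -1.0], -0.5          # hypothetical trained parameters
    bags = [
        [[0.0, 0.0], [2.0, 0.0]],     # contains one strongly positive instance
        [[0.0, 1.0], [0.0, 0.0]],     # every instance scores below zero
    ]
    for bag in bags:
        print(len(bag), "instances ->", predict_bag(w, b, bag))
```

Because bags may hold any number of instances, the scorer iterates over each bag rather than assuming a fixed-size input.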
Full Text Available
Scalable Multi-Instance Multi-Shape Support Vector Machine for Whole Slide Breast Histopathology

https://doi.org/10.1109/ICKG55886.2022.00036

Seo, Hoon; Brand, Lodewijk; Barco, Lucia Saldana; Wang, Hua (November 2022, 2022 IEEE International Conference on Knowledge Graph (ICKG))

Histopathological image analysis is critical in cancer diagnosis and treatment. Due to the huge size of histopathological images, most existing works treat the whole slide pathological image (WSI) as a bag and its patches as instances. However, these approaches are limited to analyzing patches of a fixed shape, while malignant lesions can form varied shapes. To address this challenge, we propose the Multi-Instance Multi-Shape Support Vector Machine (MIMSSVM) to jointly analyze multiple images (instances), where each instance consists of multiple patches in varied shapes. In our approach, we can identify the varied morphologic abnormalities of nuclei shapes from the multiple images. In addition to the multi-instance multi-shape learning capability, we provide an efficient algorithm to optimize the proposed model which scales well to a large number of features. Our experimental results show that the proposed MIMSSVM method outperforms existing SVM and recent deep learning models in histopathological classification. The proposed model also identifies the tissue segments in an image that indicate an abnormality, which provides utility in the early detection of malignant tumors.
Full Text Available
Scaling multi-instance support vector machine to breast cancer detection on the BreaKHis dataset

https://doi.org/10.1093/bioinformatics/btac267

Seo, Hoon; Brand, Lodewijk; Barco, Lucia Saldana; Wang, Hua (June 2022, Bioinformatics)

Motivation: Breast cancer is a type of cancer that develops in breast tissues, and, after skin cancer, it is the most commonly diagnosed cancer in women in the United States. Given that an early diagnosis is imperative to prevent breast cancer progression, many machine learning models have been developed in recent years to automate the histopathological classification of the different types of carcinomas. However, many of them do not scale to large datasets. Results: In this study, we propose the novel Primal-Dual Multi-Instance Support Vector Machine to determine which tissue segments in an image exhibit an indication of an abnormality. We derive an efficient optimization algorithm for the proposed objective by bypassing the quadratic programming and least-squares problems, which are commonly employed to optimize Support Vector Machine models. The proposed method is computationally efficient and therefore scales to large datasets. We applied our method to the public BreaKHis dataset and achieved promising prediction performance and scalability for histopathological classification. Availability and implementation: Software is publicly available at: https://1drv.ms/u/s!AiFpD21bgf2wgRLbQq08ixD0SgRD?e=OpqEmY. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
A multi-instance support vector machine with incomplete data for clinical outcome prediction of COVID-19

https://doi.org/10.1145/3459930.3469552

Brand, Lodewijk; Baker, Lauren Zoe; Wang, Hua (August 2021, Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health)
To manage the public health crisis associated with COVID-19, it is critically important that healthcare workers can quickly identify high-risk patients and provide effective treatment with limited resources. Statistical learning tools have the potential to help predict serious infection early on in the progression of the disease. However, many of these techniques are unable to take full advantage of temporal data on a per-patient basis as they handle the problem as a single-instance classification. Furthermore, these algorithms rely on complete data to make their predictions. In this work, we present a novel approach to handle the temporal and missing data problems simultaneously; our proposed Simultaneous Imputation-Multi Instance Support Vector Machine method illustrates how multiple instance learning techniques and low-rank data imputation can be utilized to accurately predict clinical outcomes of COVID-19 patients. We compare our approach against recent methods used to predict outcomes on a public dataset with a cohort of 361 COVID-19 positive patients. In addition to improved prediction performance early on in the progression of the disease, our method identifies a collection of biomarkers associated with the liver, immune system, and blood that deserve additional study and may provide additional insight into causes of patient mortality due to COVID-19. We publish the source code for our method online.
Full Text Available
A Linear Primal-Dual Multi-Instance SVM for Big Data Classifications

https://doi.org/10.1109/ICDM51629.2021.00012

Brand, Lodewijk; Baker, Lauren Zoe; Ellefsen, Carla; Sargent, Jackson; Wang, Hua (December 2021, 2021 IEEE International Conference on Data Mining (ICDM))

Multi-instance learning (MIL) is an area of machine learning that handles data that is organized into sets of instances known as bags. Traditionally, MIL is used in the supervised-learning setting and is able to classify bags which can contain any number of instances. This property allows MIL to be naturally applied to solve the problems in a wide variety of real-world applications from computer vision to healthcare. However, many traditional MIL algorithms do not scale efficiently to large datasets. In this paper we present a novel Primal-Dual Multi-Instance Support Vector Machine (pdMISVM) derivation and implementation that can operate efficiently on large scale data. Our method relies on an algorithm derived using a multi-block variation of the alternating direction method of multipliers (ADMM). The approach presented in this work is able to scale to large-scale data since it avoids iteratively solving quadratic programming problems which are generally used to optimize MIL algorithms based on SVMs. In addition, we modify our derivation to include an additional optimization designed to avoid solving a least-squares problem during our algorithm; this optimization increases the utility of our approach to handle a large number of features as well as bags. Finally, we apply our approach to synthetic and real-world multi-instance datasets to illustrate the scalability, promising predictive performance, and interpretability of our proposed method. We end our discussion with an extension of our approach to handle non-linear decision boundaries. Code and data for our methods are available online at: https://github.com/minds-mines/pdMISVM.jl.
Full Text Available
Learning Semi-Supervised Representation Enrichment Using Longitudinal Imaging-Genetic Data

https://doi.org/10.1109/BIBM49941.2020.9313310

Seo, Hoon; Brand, Lodewijk; Wang, Hua (December 2020, 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
Alzheimer's Disease (AD) is a progressive memory disorder that causes irreversible cognitive decline. Recently, many statistical learning methods have been presented to predict cognitive decline by using longitudinal imaging data. However, missing records, which are widespread in longitudinal neuroimaging data, pose a critical challenge for effectively using these data in machine learning models. To tackle this difficulty, in this paper we propose a novel approach to integrate longitudinal (dynamic) phenotypic data and static genetic data to learn a fixed-length biomarker representation using the enrichment learned from the temporal data in multiple imaging modalities. Armed with this enriched biomarker representation, as a fixed-length vector per participant, conventional machine learning models can be used to predict clinical outcomes associated with AD. We have applied our new method to the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort and achieved promising experimental results that validate its effectiveness.
Full Text Available
Factor-Bounded Nonnegative Matrix Factorization

https://doi.org/10.1145/3451395

Liu, Kai; Li, Xiangyu; Zhu, Zhihui; Brand, Lodewijk; Wang, Hua (May 2021, ACM Transactions on Knowledge Discovery from Data)
Nonnegative Matrix Factorization (NMF) is broadly used to determine class membership in a variety of clustering applications. From movie recommendations and image clustering to visual feature extraction, NMF has been applied to a large number of knowledge discovery and data mining problems. Traditional optimization methods, such as the Multiplicative Updating Algorithm (MUA), solve the NMF problem by utilizing an auxiliary function to ensure that the objective monotonically decreases. Although the objective in MUA converges, there exists no proof to show that the learned matrix factors converge as well. Without this rigorous analysis, the clustering performance and stability of the NMF algorithms cannot be guaranteed. To address this knowledge gap, in this article, we study the factor-bounded NMF problem and provide a solution algorithm with proven convergence by rigorous mathematical analysis, which ensures that both the objective and matrix factors converge. In addition, we show the relationship between MUA and our solution followed by an analysis of the convergence of MUA. Experiments on both toy data and real-world datasets validate the correctness of our proposed method and its utility as an effective clustering algorithm.
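For readers unfamiliar with multiplicative updating, the sketch below shows the classical multiplicative update rules for the squared Frobenius objective ||X - WH||², which is assumed here to be what the abstract calls MUA. It is a baseline illustration only, not the factor-bounded algorithm proposed in the article; the function names, the small `eps` guard in the denominators, and the toy matrix are all illustrative choices.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def frobenius_loss(X: Matrix, W: Matrix, H: Matrix) -> float:
    """Squared Frobenius reconstruction error ||X - WH||^2."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf_multiplicative(X: Matrix, rank: int, iters: int = 200,
                       eps: float = 1e-12, seed: int = 0
                       ) -> Tuple[Matrix, Matrix, List[float]]:
    """Alternate the two multiplicative updates and record the loss each round."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    losses = [frobenius_loss(X, W, H)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
        losses.append(frobenius_loss(X, W, H))
    return W, H, losses


if __name__ == "__main__":
    X = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
    W, H, losses = nmf_multiplicative(X, rank=2, iters=50)
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

The recorded losses show the monotone decrease of the objective that the abstract mentions; they say nothing about whether W and H themselves converge, which is the gap the article addresses.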
Full Text Available
Integrating Static and Dynamic Data for Improved Prediction of Cognitive Declines Using Augmented Genotype-Phenotype Representations

Seo, Hoon; Brand, Lodewijk; Wang, Hua; Nie, Feiping (January 2021, Proceedings of the AAAI Conference on Artificial Intelligence)
Alzheimer’s Disease (AD) is a chronic neurodegenerative disease that causes severe problems in patients’ thinking, memory, and behavior. An early diagnosis is crucial to prevent AD progression; to this end, many algorithmic approaches have recently been proposed to predict cognitive decline. However, these predictive models often fail to integrate heterogeneous genetic and neuroimaging biomarkers and struggle to handle missing data. In this work we propose a novel objective function and an associated optimization algorithm to identify cognitive decline related to AD. Our approach is designed to incorporate dynamic neuroimaging data by way of a participant-specific augmentation combined with multimodal data integration aligned via a regression task. Our approach, in order to incorporate additional side-information, utilizes structured regularization techniques popularized in recent AD literature. Armed with the fixed-length vector representation learned from the multimodal dynamic and static modalities, conventional machine learning methods can be used to predict the clinical outcomes associated with AD. Our experimental results show that the proposed augmentation model improves the prediction performance on cognitive assessment scores for a collection of popular machine learning algorithms. The results of our approach are interpreted to validate existing genetic and neuroimaging biomarkers that have been shown to be predictive of cognitive decline.
Full Text Available
Task Balanced Multimodal Feature Selection to Predict the Progression of Alzheimer’s Disease

https://doi.org/10.1109/BIBE50027.2020.00040

Brand, Lodewijk; O'Callaghan, Braedon; Sun, Anthony; Wang, Hua (October 2020, 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE))
The social and financial costs associated with Alzheimer's disease (AD) result in significant burdens on our society. In order to understand the causes of this disease, public-private partnerships such as the Alzheimer's Disease Neuroimaging Initiative (ADNI) release data to the scientific community. These data are organized into various modalities (genetic, brain-imaging, cognitive scores, diagnoses, etc.) for analysis. Many statistical learning approaches used in medical image analysis do not explicitly take advantage of this multimodal data structure. In this work we propose a novel objective function and optimization algorithm that is designed to handle multimodal information for the prediction and analysis of AD. Our approach relies on robust matrix factorization and row-wise sparsity provided by the ℓ2,1-norm in order to integrate multimodal data provided by the ADNI. These techniques are jointly optimized with a classification task to guide the feature selection in our proposed Task Balanced Multimodal Feature Selection method. Our results, when compared against some widely used machine learning algorithms, show improved balanced accuracies, precision, and Matthews correlation coefficients for identifying cognitive decline. In addition to the improved prediction performance, our method is able to identify brain and genetic biomarkers that are of interest to the clinical research community. Our experiments validate existing brain biomarkers and single nucleotide polymorphisms located on chromosome 11 and detail novel polymorphisms on chromosome 10 that, to the best of the authors' knowledge, have not previously been reported. We anticipate that our method will be of interest to the greater research community and have released our method's code online. Code is provided at: https://github.com/minds-mines/TBMFSjl
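The row-wise sparsity mentioned above comes from how the ℓ2,1-norm is computed. The sketch below assumes the usual definition, the sum of the Euclidean norms of a matrix's rows, with rows standing for features. The helper names and the tolerance used to call a row "selected" are hypothetical and are not taken from the authors' released code.

```python
import math
from typing import List

Matrix = List[List[float]]


def l21_norm(M: Matrix) -> float:
    """Sum of the Euclidean (l2) norms of the rows of M."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in M)


def selected_rows(M: Matrix, tol: float = 1e-8) -> List[int]:
    """Indices of rows whose l2 norm exceeds tol (features that remain active)."""
    return [i for i, row in enumerate(M)
            if math.sqrt(sum(v * v for v in row)) > tol]


if __name__ == "__main__":
    # Hypothetical weight matrix: 3 features (rows) by 2 tasks (columns).
    weights = [[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]]
    print(l21_norm(weights))       # 5 + 0 + 13
    print(selected_rows(weights))  # the all-zero second row is dropped
```

Penalizing this quantity tends to drive entire rows to zero at once, which is why it is commonly used for feature selection across several tasks or modalities.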
Full Text Available
Joint Multi-Modal Longitudinal Regression and Classification for Alzheimer’s Disease Prediction

https://doi.org/10.1109/TMI.2019.2958943

Brand, Lodewijk; Nichols, Kai; Wang, Hua; Shen, Li; Huang, Heng (January 2020, IEEE Transactions on Medical Imaging)

Alzheimer’s disease (AD) is a serious neurodegenerative condition that affects millions of individuals across the world. As the average age of individuals in the United States and the world increases, the prevalence of AD will continue to grow. To address this public health problem, the research community has developed computational approaches to sift through various aspects of clinical data and uncover insights from them. One of the most challenging problems is to determine the biological mechanisms that cause AD to develop. To study this problem, in this paper we present a novel Joint Multi-Modal Longitudinal Regression and Classification method and show how it can be used to identify the cognitive status of the participants in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort and the underlying biological mechanisms. By intelligently combining clinical data of various modalities (i.e., genetic information and brain scans) using a variety of regularizations that can identify AD-relevant biomarkers, we perform the regression and classification tasks simultaneously. Because the proposed objective is a non-smooth optimization problem that is difficult to solve in general, we derive an efficient iterative algorithm and rigorously prove its convergence. To validate our new method in predicting the cognitive scores of patients and their clinical diagnosis, we conduct comprehensive experiments on the ADNI cohort. Our promising results demonstrate the benefits and flexibility of the proposed method. We anticipate that our new method is of interest to clinical communities beyond AD research and have open-sourced the code of our method online.
Full Text Available